Search for: All records

Creators/Authors contains: "Carey, Michael J."


Total Resources: 25

Resource Type
Conference Paper: 10
Conference Proceeding: 0
Dataset: 0
Journal Article: 15
Workshop Report: 0

Availability
Full Text / Resource Available: 24
Citation Only: 1

A new window Clause for SQL++

https://doi.org/10.1007/s00778-023-00830-z

Fang, James ; Lychagin, Dmitry ; Carey, Michael J. ; Tsotras, Vassilis J. ( December 2023 , The VLDB Journal)

Abstract
Window queries are important analytical tools for ordered data and have been researched both in streaming and stored data environments. By incorporating ideas for window queries from existing streaming and stored data systems, we propose a new window syntax that makes a wide range of window queries easier to write and optimize. We have implemented this new window syntax in SQL++, an SQL extension that supports querying semistructured data, on top of AsterixDB, a Big Data Management System, thus allowing us to process window queries over large datasets in a parallel and efficient manner.

Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems

https://doi.org/10.1145/3604437.3604460

Pavlopoulou, Christina ; Carey, Michael J. ; Tsotras, Vassilis J. ( June 2023 , ACM SIGMOD Record)

Effective query optimization remains an open problem for Big Data Management Systems. In this work, we revisit an old idea, runtime dynamic optimization, and adapt it to a big data management system, AsterixDB. The approach runs in stages (re-optimization points), starting by first executing all predicates local to a single dataset. The intermediate result created by a stage is then used to re-optimize the remaining query. This re-optimization approach avoids inaccurate intermediate result cardinality estimates, thus leading to much better execution plans. While it introduces overhead for materializing intermediate results, experiments show that this overhead is relatively small and is an acceptable price to pay given the optimization benefits.
Free, publicly-accessible full text available June 7, 2024
Multi-valued indexing in Apache AsterixDB (SI DOLAP 2022)

https://doi.org/10.1016/j.is.2022.102144

Galvizo, Glenn ; Carey, Michael J. ( January 2023 , Information Systems)

Full Text Available
DynaHash: Efficient Data Rebalancing in Apache AsterixDB

https://doi.org/10.1109/icde53745.2022.00041

Luo, Chen ; Carey, Michael J. ( May 2022 , Proc. ICDE Conf.)

Full Text Available
JEDI: These aren't the JSON documents you're looking for...

https://doi.org/10.1145/3514221.3517850

Hütter, Thomas ; Augsten, Nikolaus ; Kirsch, Christoph M. ; Carey, Michael J. ; Li, Chen ( June 2022 , Proc. ACM SIGMOD Conf.)

Full Text Available
Subscribing to big data at scale

https://doi.org/10.1007/s10619-022-07406-w

Wang, Xikui ; Carey, Michael J. ; Tsotras, Vassilis J. ( April 2022 , Distributed and Parallel Databases)

Abstract
Today, data is being actively generated by a variety of devices, services, and applications. Such data is important not only for the information that it contains, but also for its relationships to other data and to interested users. Most existing Big Data systems focus on passively answering queries from users, rather than actively collecting data, processing it, and serving it to users. To satisfy both passive and active requests at scale, application developers need either to heavily customize an existing passive Big Data system or to glue one together with systems like Streaming Engines and Pub-sub services. Either choice requires significant effort and incurs additional overhead. In this paper, we present the BAD (Big Active Data) system as an end-to-end, out-of-the-box solution for this challenge. It is designed to preserve the merits of passive Big Data systems and introduces new features for actively serving Big Data to users at scale. We show the design and implementation of the BAD system, demonstrate how BAD facilitates providing both passive and active data services, investigate the BAD system’s performance at scale, and illustrate the complexities that would result from instead providing BAD-like services with a “glued” system.

Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems

Pavlopoulou, Christina ; Carey, Michael J. ; Tsotras, Vassilis J. ( March 2022 , 25th International Conference on Extending Database Technology)
Query optimization remains an open problem for Big Data Management Systems. Traditional optimizers are cost-based and use statistical estimates of intermediate result cardinalities to assign costs and pick the best plan. However, such estimates tend to become less accurate in the presence of filtering conditions that involve undetected correlations between multiple predicates local to a single dataset, predicates with query parameters, or predicates involving user-defined functions (UDFs). Consequently, traditional query optimizers tend to ignore or miscalculate those settings, thus leading to suboptimal execution plans. Given the volume of today’s data, a suboptimal plan can quickly become very inefficient. In this work, we revisit the old idea of runtime dynamic optimization and adapt it to a shared-nothing distributed database system, AsterixDB. The optimization runs in stages (re-optimization points), starting by first executing all predicates local to a single dataset. The intermediate result created from each stage is used to re-optimize the remaining query. This re-optimization approach avoids inaccurate intermediate result cardinality estimations, thus leading to much better execution plans. While it introduces overhead for materializing these intermediate results, our experiments show that this overhead is relatively small and is an acceptable price to pay given the optimization benefits. In fact, our experimental evaluation shows that runtime dynamic optimization leads to much better execution plans than the current default AsterixDB plans, plans produced by static cost-based optimization (i.e., based on the initial dataset statistics), and other state-of-the-art approaches.
Full Text Available
Exploratory Data Analysis with Database-backed Dataframes: A Case Study on Airbnb Data

https://doi.org/10.1109/bigdata52589.2021.9671603

Sinthong, Phanwadee ; Carey, Michael J. ( December 2021 , Proc. IEEE Int’l. Workshop on Benchmarking, Performance Tuning, and Optimization for Big Data Applications (BPOD))

Full Text Available
Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems

Pavlopoulou, Christina ; Carey, Michael J. ; Tsotras, Vassilis J. ( January 2022 , EDBT)

Full Text Available
PolyFrame: a retargetable query-based approach to scaling dataframes

https://doi.org/10.14778/3476249.3476281

Sinthong, Phanwadee ; Carey, Michael J. ( July 2021 , Proceedings of the VLDB Endowment)
In the last few years, the field of data science has been growing rapidly as various businesses have adopted statistical and machine learning techniques to empower their decision-making and applications. Scaling data analyses to large volumes of data requires the utilization of distributed frameworks. This can lead to serious technical challenges for data analysts and reduce their productivity. AFrame, a data analytics library, is implemented as a layer on top of Apache AsterixDB, addressing these issues by providing the data scientists' familiar interface, Pandas Dataframe, and transparently scaling out the evaluation of analytical operations through a Big Data management system. While AFrame is able to leverage data management facilities (e.g., indexes and query optimization) and allows users to interact with a large volume of data, the initial version only generated SQL++ queries and only operated against AsterixDB. In this work, we describe a new design that retargets AFrame's incremental query formation to other query-based database systems, making it more flexible for deployment against other data management systems with composable query languages.
Full Text Available
